Results 1 - 20 of 26
1.
Front Big Data ; 7: 1346958, 2024.
Article in English | MEDLINE | ID: mdl-38650693

ABSTRACT

Introduction: Acupuncture and tuina, acknowledged as ancient and highly efficacious therapeutic modalities within the domain of Traditional Chinese Medicine (TCM), have provided pragmatic treatment pathways for numerous patients. To address the problems of ambiguity in the concept of TCM acupuncture and tuina treatment protocols, the lack of accurate quantitative assessment of treatment protocols, and the diversity of TCM systems, we have established a map-filling technique for modern literature to achieve personalized medical recommendations. Methods: (1) Extensive acupuncture and tuina data were collected, analyzed, and processed to establish a concise TCM domain knowledge base. (2) A template-free Chinese text named entity recognition (NER) joint training method (TemplateFC) was proposed, which enhances the EntLM model with BiLSTM and CRF layers. Appropriate rules were set for ERE. (3) A comprehensive knowledge graph (KG) comprising 10,346 entities and 40,919 relationships was constructed based on modern literature. Results: A robust TCM KG with a wide range of entities and relationships was created. The template-free joint training approach significantly improved NER accuracy, especially in Chinese text, addressing issues related to entity identification and tokenization differences. The KG provided valuable insights into acupuncture and tuina, facilitating efficient information retrieval and personalized treatment recommendations. Discussion: The integration of KGs in TCM research is essential for advancing diagnostics and interventions. Challenges in NER and ERE were effectively tackled using hybrid approaches and innovative techniques. The comprehensive TCM KG we built contributes to bridging the gap in TCM knowledge and serves as a valuable resource for specialists and non-specialists alike.

2.
Nat Commun ; 14(1): 7554, 2023 Nov 20.
Article in English | MEDLINE | ID: mdl-37985761

ABSTRACT

Lunar surface chemistry is essential for revealing petrological characteristics to understand the evolution of the Moon. Existing chemistry mapping from Apollo and Luna returned samples could only calibrate chemical features before 3.0 Gyr, missing the critical late period of the Moon. Here we present major oxide chemistry maps by adding distinctive 2.0 Gyr Chang'e-5 lunar soil samples in combination with a deep learning-based inversion model. The inferred chemical contents are more precise than the Lunar Prospector Gamma-Ray Spectrometer (GRS) maps and are closest to returned-sample abundances compared to existing literature. Verification against in situ measurement data acquired by the Chang'e 3 and Chang'e 4 lunar rovers demonstrated that Chang'e-5 samples are indispensable ground truth in mapping lunar surface chemistry. From these maps, young mare basalt units are identified, which can be potential sites for future sample-return missions to constrain the late lunar magmatic and thermal history.

3.
Front Genet ; 14: 1151962, 2023.
Article in English | MEDLINE | ID: mdl-37205122

ABSTRACT

The exploration of important biomarkers associated with cancer development is crucial for diagnosing cancer, designing therapeutic interventions, and predicting prognoses. The analysis of gene co-expression provides a systemic perspective on gene networks and can be a valuable tool for mining biomarkers. The main objective of co-expression network analysis is to discover highly synergistic sets of genes, and the most widely used method is weighted gene co-expression network analysis (WGCNA). WGCNA measures gene correlation with the Pearson correlation coefficient and uses hierarchical clustering to identify gene modules. The Pearson correlation coefficient reflects only the linear dependence between variables, and the main drawback of hierarchical clustering is that once two objects are clustered together, the process cannot be reversed. Hence, readjusting inappropriate cluster divisions is not possible. Existing co-expression network analysis methods rely on unsupervised methods that do not utilize prior biological knowledge for module delineation. Here we present a method for identifying outstanding modules in a co-expression network using a knowledge-injected semi-supervised learning approach (KISL), which utilizes a priori biological knowledge and a semi-supervised clustering method to address the issues in current GCN-based clustering methods. To measure both the linear and non-linear dependence between genes, we introduce distance correlation, owing to the complexity of gene-gene relationships. Eight RNA-seq datasets of cancer samples are used to validate its effectiveness. In all eight datasets, the KISL algorithm outperformed WGCNA when comparing the silhouette coefficient, Calinski-Harabasz index and Davies-Bouldin index evaluation metrics. According to the results, KISL clusters had better cluster evaluation values and better gene module aggregation.
Enrichment analysis of the recognition modules demonstrated their effectiveness in discovering modular structures in biological co-expression networks. In addition, as a general method, KISL can be applied to various co-expression network analyses based on similarity metrics. Source codes for the KISL and the related scripts are available online at https://github.com/Mowonhoo/KISL.git.
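
The contrast this abstract draws between the Pearson coefficient and distance correlation can be sketched as follows. This is a minimal illustration of the standard sample distance correlation, not the KISL code from the linked repository; the function names and the toy vectors are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def _double_centered(v):
    """Pairwise |vi - vj| matrix with row, column and grand means removed."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row_mean = [sum(r) / n for r in d]
    grand = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation; detects non-linear dependence too."""
    n = len(x)
    a, b = _double_centered(x), _double_centered(y)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / n ** 2

    dcov2 = max(mean_product(a, b), 0.0)  # guard against tiny negative rounding
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    return math.sqrt(dcov2 / denom) if denom > 0 else 0.0


# Hypothetical toy data: y depends on x, but not linearly.
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]
print(pearson(x, y))
print(distance_correlation(x, y))
```

On this toy input the Pearson coefficient vanishes while the distance correlation stays clearly positive, which is the kind of non-linear dependence the abstract says a linear measure misses.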

4.
J Cardiovasc Transl Res ; 16(4): 896-904, 2023 08.
Article in English | MEDLINE | ID: mdl-36928587

ABSTRACT

The visual inspection of coronary artery stenosis is known to be significantly affected by variation due to the presence of other tissues, camera movements, and uneven illumination. More accurate and intelligent coronary angiography diagnostic models are necessary to address these problems. In this study, 2980 medical images from 949 patients were collected and a novel deep learning-based coronary angiography (DLCAG) diagnosis system is proposed. First, we design a coronary classification module. Then, we introduce RetinaNet to balance positive and negative samples and improve the recognition accuracy. Additionally, DLCAG adopts instance segmentation to segment the vessel stenosis and depict the degree of stenosis. Our DLCAG is available at http://101.132.120.184:8077/ . To use the system, doctors only need to log in and upload the coronary angiography videos; a diagnostic report is then generated automatically.


Subject(s)
Coronary Stenosis, Deep Learning, Humans, Coronary Angiography/methods, Pathologic Constriction, Coronary Stenosis/diagnostic imaging, Heart, Coronary Vessels/diagnostic imaging, Computed Tomography Angiography/methods

5.
Sci Rep ; 13(1): 2, 2023 01 02.
Article in English | MEDLINE | ID: mdl-36593288

ABSTRACT

More and more people are under high pressure in modern society, leading to growing mental disorders, such as antenatal depression in pregnant women. Antenatal depression can affect pregnant women's physical and psychological health and child outcomes, and cause postpartum depression. Therefore, it is essential to detect antenatal depression in pregnant women early. This study aims to predict pregnant women's antenatal depression and identify factors that may lead to antenatal depression. First, a questionnaire was designed, based on the daily life of pregnant women. The survey was conducted on pregnant women in a hospital, where 5666 pregnant women participated. As the collected data are imbalanced and high-dimensional, we developed a one-class classifier named Stacked Auto Encoder Support Vector Data Description (SAE-SVDD) to distinguish depressed pregnant women from normal ones. To validate the method, SAE-SVDD was first applied to three benchmark datasets. The results showed that SAE-SVDD was effective, with its F-scores better than those of other popular classifiers. For the antenatal depression problem, the F-score of SAE-SVDD was higher than 0.87, demonstrating that the questionnaire is informative and the classification method is successful. Then, by an improved Term Frequency-Inverse Document Frequency (TF-IDF) analysis, the critical factors of antenatal depression were identified as work stress, marital status, husband support, passive smoking, and alcohol consumption. With its generalizability, SAE-SVDD can be applied to analyze other questionnaires.
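
The abstract does not say how its TF-IDF analysis was improved, so the sketch below shows only the standard TF-IDF weighting that such an analysis starts from. The function name, the tokenized "responses" and the keyword labels are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document using tf * ln(N / df)."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy "responses", each reduced to a list of keyword tokens.
responses = [
    ["work_stress", "passive_smoking", "work_stress"],
    ["work_stress", "husband_support"],
    ["husband_support", "exercise"],
]
scores = tf_idf(responses)
top_term = max(scores[0], key=scores[0].get)
print(top_term)
```

A term that is frequent within one response but rare across the others receives the highest weight, and a term present in every response scores zero.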


Subject(s)
Pregnancy Complications, Pregnant Women, Female, Humans, Pregnancy, Alcohol Drinking, Marital Status, Pregnancy Complications/diagnosis, Pregnant Women/psychology, Surveys and Questionnaires

6.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35596953

ABSTRACT

Coronavirus disease 2019 (COVID-19) has infected hundreds of millions of people and killed millions of them. As an RNA virus, COVID-19 is more susceptible to variation than other viruses. Many problems involved in this epidemic have made biosafety and biosecurity (hereafter collectively referred to as 'biosafety') a popular and timely topic globally. Biosafety research covers a broad and diverse range of topics, and it is important to quickly identify hotspots and trends in biosafety research through big data analysis. However, the data-driven literature on biosafety research discovery is quite scant. We developed a novel topic model based on latent Dirichlet allocation, affinity propagation clustering and the PageRank algorithm (LDAPR) to extract knowledge from biosafety research publications from 2011 to 2020. Then, we conducted hotspot and trend analysis with LDAPR and carried out further studies, including annual hot topic extraction, a 10-year keyword evolution trend analysis, topic map construction, hot region discovery and fine-grained correlation analysis of interdisciplinary research topic trends. These analyses revealed valuable information that can guide epidemic prevention work: (1) the research enthusiasm over a certain infectious disease not only is related to its epidemic characteristics but also is affected by the progress of research on other diseases, and (2) infectious diseases are not only strongly related to their corresponding microorganisms but also potentially related to other specific microorganisms. The detailed experimental results and our code are available at https://github.com/KEAML-JLU/Biosafety-analysis.


Subject(s)
COVID-19, Biosecurity, COVID-19/epidemiology, Containment of Biohazards/methods, Humans, Machine Learning, RNA

7.
Entropy (Basel) ; 23(3)2021 Mar 12.
Article in English | MEDLINE | ID: mdl-33809188

ABSTRACT

Machine learning models can automatically discover biomedical research trends and promote the dissemination of information and knowledge. Text feature representation is a critical and challenging task in natural language processing. Most methods of text feature representation are based on word representation. A good representation can capture semantic and structural information. In this paper, two fusion algorithms are proposed, namely, the Tr-W2v and Ti-W2v algorithms. They are based on the classical text feature representation model and consider the importance of words. The results show that the effectiveness of the two fusion text representation models is better than the classical text representation model, and the results based on the Tr-W2v algorithm are the best. Furthermore, based on the Tr-W2v algorithm, trend analyses of cancer research are conducted, including correlation analysis, keyword trend analysis, and improved keyword trend analysis. The discovery of the research trends and the evolution of hotspots for cancers can help doctors and biological researchers collect information and provide guidance for further research.

8.
Nat Commun ; 11(1): 6358, 2020 12 22.
Article in English | MEDLINE | ID: mdl-33353954

ABSTRACT

Impact craters, which can be considered the lunar equivalent of fossils, are the most dominant lunar surface features and record the history of the Solar System. We address the problem of automatic crater detection and age estimation. From initially small numbers of recognized craters and dated craters, i.e., 7895 and 1411, respectively, we progressively identify new craters and estimate their ages with Chang'E data and stratigraphic information by transfer learning using deep neural networks. This results in the identification of 109,956 new craters, which is more than a dozen times greater than the initial number of recognized craters. The formation systems of 18,996 newly detected craters larger than 8 km are estimated. Here, a new lunar crater database for the mid- and low-latitude regions of the Moon is derived and distributed to the planetary community together with the related data analysis.

10.
BMC Med Genet ; 21(1): 65, 2020 03 30.
Article in English | MEDLINE | ID: mdl-32228543

ABSTRACT

BACKGROUND: Several obesity susceptibility loci in genes, including GNPDA2, SH2B1, TMEM18, MTCH2, CDKAL1, FAIM2, and MC4R, have been identified by genome-wide association studies. The purpose of this study was to investigate whether these loci are associated with the concurrence of obesity and type 2 diabetes in Chinese Han patients. METHODS: Using the SNaPshot technique, we genotyped seven single nucleotide polymorphisms (SNPs) in 439 Chinese patients living in Northeast China who presented at The Second Hospital of Jilin University. We analyzed the associations between these seven alleles and clinical characteristics. RESULTS: Risk alleles near TMEM18 (rs6548238) were associated with increased waist circumference, waist/hip ratio, body mass index (BMI), fasting plasma glucose, hemoglobin A1c, diastolic blood pressure, triglycerides, total cholesterol, and low-density lipoprotein-cholesterol; risk alleles of CDKAL1 (rs7754840) were associated with increased waist circumference and waist/hip ratio; and FAIM2 (rs7138803) risk alleles were linked to increased BMI, diastolic blood pressure, and triglycerides (all P < 0.05). After adjusting for sex and age, loci near TMEM18 (rs6548238) and FAIM2 (rs7138803), but not SH2B1 (rs7498665), near GNPDA2 (rs10938397), MTCH2 (rs10838738) and near MC4R (rs12970134), were associated with increased risk for type 2 diabetes in obese individuals. CONCLUSION: We found that loci near TMEM18 (rs6548238), CDKAL1 (rs7754840), and FAIM2 (rs7138803) may be associated with obesity-related indicators, and loci near TMEM18 (rs6548238) and FAIM2 (rs7138803) may increase susceptibility of concurrent type 2 diabetes associated with obesity.


Subject(s)
Apoptosis Regulatory Proteins/genetics, Type 2 Diabetes Mellitus/genetics, Genetic Loci, Membrane Proteins/genetics, Obesity/genetics, tRNA Methyltransferases/genetics, Adolescent, Adult, Aged, Asian People/ethnology, Asian People/genetics, Case-Control Studies, China/epidemiology, Type 2 Diabetes Mellitus/complications, Type 2 Diabetes Mellitus/ethnology, Female, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Male, Middle Aged, Obesity/complications, Obesity/ethnology, Single Nucleotide Polymorphism, Young Adult

11.
Int J Mol Sci ; 21(6)2020 Mar 22.
Article in English | MEDLINE | ID: mdl-32235704

ABSTRACT

With recent advances in single-cell RNA sequencing, enormous transcriptome datasets have been generated. These datasets have furthered our understanding of cellular heterogeneity and its underlying mechanisms in homogeneous populations. Single-cell RNA sequencing (scRNA-seq) data clustering can group cells belonging to the same cell type based on patterns embedded in gene expression. However, scRNA-seq data are high-dimensional, noisy, and sparse, owing to the limitations of existing scRNA-seq technologies. Traditional clustering methods are neither effective nor efficient for high-dimensional and sparse matrix computations. Therefore, several dimension reduction methods have been introduced. To validate a reliable and standard research routine, we conducted a comprehensive review and evaluation of four classical dimension reduction methods and five clustering models. Four experiments were progressively performed on two large scRNA-seq datasets using 20 models. Results showed that the feature selection method contributed positively to high-dimensional and sparse scRNA-seq data. Moreover, feature-extraction methods were able to improve clustering performance, although not in every case. Independent component analysis (ICA) performed well in small compressed feature spaces, whereas principal component analysis was steadier than all the other feature-extraction methods. In addition, ICA was not ideal for fuzzy C-means clustering in scRNA-seq data analysis. K-means clustering was combined with feature-extraction methods to achieve good results.


Subject(s)
RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Algorithms, Animals, Cluster Analysis, Gene Expression Profiling/methods, Mice, Transcriptome

12.
Int J Biol Sci ; 15(10): 2065-2074, 2019.
Article in English | MEDLINE | ID: mdl-31592230

ABSTRACT

About 29.8 million people worldwide had been diagnosed with Alzheimer's disease (AD) in 2015, and the number is projected to triple by 2050. In 2018, AD was the fifth leading cause of death in Americans aged 65 years or older, but the progress of AD drug research is very limited. Identifying the key factors and research trends of AD can help guide more effective studies. We proposed a framework named LDAP, which combined the latent Dirichlet allocation model and the affinity propagation algorithm to extract research topics from 95,876 AD-related papers published from 2007 to 2016. Trend and hotspot analyses were performed on the LDAP results. We found that the focus points of AD research for the past 10 years include 15 diseases, 15 amino acids, peptides, and proteins, 9 enzymes and coenzymes, 7 hormones, 7 carbohydrates, 5 lipids, 2 organophosphonates, 18 chemicals, 11 compounds, 13 symptoms, and 20 phenomena. Our LDAP framework allowed us to trace the evolution of research trends and the most popular areas of interest (hotspots) on disease, protein, symptom, and phenomena. Meanwhile, 556 AD-related genes were identified, which are enriched in 12 KEGG pathways including the AD pathway and the nitrogen metabolism pathway. Our results are freely available at https://www.keaml.cn/Alzheimer.


Subject(s)
Alzheimer Disease, Biomedical Research/trends, Machine Learning, Humans, PubMed, United States

13.
J Med Internet Res ; 21(5): e12957, 2019 05 24.
Article in English | MEDLINE | ID: mdl-31127715

ABSTRACT

BACKGROUND: It is of great importance for researchers to publish research results in high-quality journals. However, it is often challenging to choose the most suitable publication venue, given the exponential growth of journals and conferences. Although recommender systems have achieved success in promoting movies, music, and products, very few studies have explored recommendation of publication venues, especially for biomedical research. No recommender system exists that can specifically recommend journals in PubMed, the largest collection of biomedical literature. OBJECTIVE: We aimed to propose a publication recommender system, named Pubmender, to suggest suitable PubMed journals based on a paper's abstract. METHODS: In Pubmender, pretrained word2vec was first used to construct the start-up feature space. Subsequently, a deep convolutional neural network was constructed to achieve a high-level representation of abstracts, and a fully connected softmax model was adopted to recommend the best journals. RESULTS: We collected 880,165 papers from 1130 journals in PubMed Central and extracted abstracts from these papers as an empirical dataset. We compared different recommendation models such as Cavnar-Trenkle on the Microsoft Academic Search (MAS) engine, a collaborative filtering-based recommender system for the digital library of the Association for Computing Machinery (ACM) and CiteSeer. We found the accuracy of our system for the top 10 recommendations to be 87.0%, 22.9%, and 196.0% higher than that of MAS, ACM, and CiteSeer, respectively. In addition, we compared our system with Journal Finder and Journal Suggester, which are tools of Elsevier and Springer, respectively, that help authors find suitable journals in their series. The results revealed that the accuracy of our system was 329% higher than that of Journal Finder and 406% higher than that of Journal Suggester for the top 10 recommendations. 
Our web service is freely available at https://www.keaml.cn:8081/. CONCLUSIONS: Our deep learning-based recommender system can suggest an appropriate journal list to help biomedical scientists and clinicians choose suitable venues for their papers.


Subject(s)
Deep Learning/trends, Biomedical Research, Humans, Publications, Validation Studies as Topic

14.
J Ambient Intell Humaniz Comput ; 10(5): 2029-2040, 2019 May.
Article in English | MEDLINE | ID: mdl-31068980

ABSTRACT

With the massive volume and rapid growth of data, the study of feature spaces is of great importance. To avoid the complex training processes of deep learning models, which project the original feature space into low-dimensional ones, we propose a novel feature space learning (FSL) model. The main contributions of our approach are: (1) FSL can not only select useful features but also adaptively update feature values and span new feature spaces; (2) four FSL algorithms are proposed with the feature space updating procedure; (3) FSL can provide a better understanding of the data and learn descriptive and compact feature spaces without the demanding training required by deep architectures. Experimental results on benchmark data sets demonstrate that FSL-based algorithms performed better than classical unsupervised, semi-supervised and even incremental semi-supervised algorithms. In addition, we show a visualization of the learned feature space results. With the carefully designed learning strategy, FSL dynamically disentangles explanatory factors, suppresses noise accumulation and semantic shift, and constructs easy-to-understand feature spaces.

15.
BMC Syst Biol ; 13(1): 13, 2019 Jan 22.
Article in English | MEDLINE | ID: mdl-30670065

ABSTRACT

It was highlighted that the original article [1] contained a typesetting error in the last name of Allon Canaan. This was incorrectly captured as Allon Canaann in the original article which has since been updated.

16.
Neural Process Lett ; 50(1): 103-119, 2019 Aug.
Article in English | MEDLINE | ID: mdl-35035261

ABSTRACT

Automatically describing the contents of an image using natural language has drawn much attention because it not only integrates computer vision and natural language processing but also has practical applications. Using an end-to-end approach, we propose a bidirectional semantic attention-based guiding of long short-term memory (Bag-LSTM) model for image captioning. The proposed model consciously refines image features from previously generated text. By fine-tuning the parameters of convolutional neural networks, Bag-LSTM obtains more text-related image features via feedback propagation than other models. As opposed to existing guidance-LSTM methods, which directly add image features into each unit of an LSTM block, our fine-tuned model dynamically leverages more text-conditional image features, acquired by the semantic attention mechanism, as guidance information. Moreover, we exploit bidirectional gLSTM as the caption generator, which is capable of learning long-term relations between visual features and semantic information by making use of both historical and future contextual information. In addition, variations of the Bag-LSTM model are proposed in an effort to sufficiently describe high-level visual-language interactions. Experiments on the Flickr8k and MSCOCO benchmark datasets demonstrate the effectiveness of the model compared with the baseline algorithms; for example, it scores 51.2% higher than BRNN on the CIDEr metric.

17.
BMC Syst Biol ; 12(Suppl 7): 114, 2018 12 14.
Article in English | MEDLINE | ID: mdl-30547798

ABSTRACT

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) technology provides an effective way to study cell heterogeneity. However, due to the low capture efficiency and stochastic gene expression, scRNA-seq data often contain a high percentage of missing values. It has been shown that the missing rate can reach approximately 30% even after noise reduction. To accurately recover missing values in scRNA-seq data, we need to know where the missing data are, how much data is missing, and what the values of these data are. METHODS: To solve these three problems, we propose a novel model with a hybrid machine learning method, namely, missing imputation for single-cell RNA-seq (MISC). To solve the first problem, we transformed it into a binary classification problem on the RNA-seq expression matrix. Then, for the second problem, we searched for the intersection of the classification results, zero-inflated model and false negative model results. Finally, we used the regression model to recover the data in the missing elements. RESULTS: We compared the raw data without imputation, the mean-smooth neighbor cell trajectory, and MISC on chronic myeloid leukemia (CML) data and on the primary somatosensory cortex and the hippocampal CA1 region of mouse brain cells. On the CML data, MISC discovered a trajectory branch from the CP-CML to the BC-CML, which provides direct evidence of evolution from CP to BC stem cells. On the mouse brain data, MISC clearly divides the pyramidal CA1 into different branches, providing direct evidence of subpopulations in pyramidal CA1. Meanwhile, with MISC, the oligodendrocyte cells became an independent group with an apparent boundary. CONCLUSIONS: Our results showed that the MISC model improved the cell type classification and could be instrumental in studying cellular heterogeneity. Overall, MISC is a robust missing data imputation model for single-cell RNA-seq data.


Subject(s)
RNA Sequence Analysis/methods, Single-Cell Analysis, Humans, BCR-ABL Positive Chronic Myelogenous Leukemia/genetics, BCR-ABL Positive Chronic Myelogenous Leukemia/pathology, Neoplastic Stem Cells/metabolism, Neoplastic Stem Cells/pathology

18.
BMC Syst Biol ; 12(Suppl 7): 116, 2018 12 14.
Article in English | MEDLINE | ID: mdl-30547805

ABSTRACT

BACKGROUND: Because of the huge economic burden they place on society, obesity and diabetes have become among the most serious public health challenges in the world. To reveal the close and complex relationships between diabetes, obesity and other diseases, and to search for effective treatments, a novel model named the representative latent Dirichlet allocation (RLDA) topic model is presented. RESULTS: RLDA was applied to a corpus of more than 337,000 publications on diabetes and obesity published from 2007 to 2016. To unveil the meaningful relationships between diabetes mellitus, obesity and other diseases, we performed an explicit analysis of the output of our model with a series of visualization tools. Then, to show the credibility of our discoveries, we checked them against clinical reports that were not used in the training data and found that a sufficient number of these records matched directly. Our results illustrate that in the last 10 years, for diseases accompanying obesity, scientists and researchers have mainly focused on 17 of them, such as asthma, gastric disease, heart disease and so on; the study of diabetes mellitus features a broader scope of 26 diseases, such as Alzheimer's disease, heart disease and so forth; and for both of them, there are 15 accompanying diseases, as follows: adrenal disease, anxiety, cardiovascular disease, depression, heart disease, hepatitis, hypertension, hypothalamic disease, respiratory disease, myocardial infarction, OSAS, liver disease, lung disease, schizophrenia, tuberculosis. In addition, tumor necrosis factor, tumor, adolescent obesity or diabetes, inflammation, hypertension and cell are going to be the hot topics related to diabetes mellitus and obesity in the next few years. CONCLUSIONS: With the help of RLDA, the hotspots analysis-relation discovery results on diabetes and obesity were achieved.
We extracted the significant relationships between them and other diseases such as Alzheimer's disease, heart disease and tumor. It is believed that the newly proposed representation learning algorithm can help biomedical researchers better focus their attention and optimize their research direction.


Subject(s)
Computational Biology/methods, Diabetes Complications, Obesity/complications, Algorithms, PubMed

19.
BMC Med Genomics ; 11(Suppl 5): 106, 2018 Nov 20.
Article in English | MEDLINE | ID: mdl-30453959

ABSTRACT

BACKGROUND: Non-small cell lung cancer (NSCLC) accounts for more than about 80% of lung cancers. The early stages of NSCLC can be treated with complete resection with a good prognosis. However, most cases are detected at a late stage of the disease. The average survival rate of patients with invasive lung cancer is only about 4%. Adenocarcinoma in situ (AIS) is an intermediate subtype of lung adenocarcinoma that exhibits early stage growth patterns but can develop into invasion. METHODS: In this study, we used RNA-seq data from normal, AIS, and invasive lung cancer tissues to identify a gene module that represents the distinguishing characteristics of AIS as AIS-specific genes. Two differential expression analysis algorithms were employed to identify the AIS-specific genes. Then, the best-performing subset of AIS-specific genes for early lung cancer prediction was selected by random forest. Finally, the performance of early lung cancer prediction was assessed using random forest, support vector machine (SVM) and artificial neural networks (ANNs) on four independent early lung cancer datasets, including one tumor-educated blood platelets (TEPs) dataset. RESULTS: Based on the differential expression analysis, 107 AIS-specific genes that consisted of 93 protein-coding genes and 14 long non-coding RNAs (lncRNAs) were identified. The significant functions associated with these genes include angiogenesis and ECM-receptor interaction, which are highly related to cancer development and contribute to the smoking-free lung cancers. Moreover, 12 of the AIS-specific lncRNAs are involved in lung cancer progression by potentially regulating the ECM-receptor interaction pathway. The feature selection by random forest identified 20 of the AIS-specific genes as early stage lung cancer signatures using the dataset obtained from The Cancer Genome Atlas (TCGA) lung adenocarcinoma samples.
Of the 20 signatures, two were lncRNAs, BLACAT1 and CTD-2527I21.15, which have been reported to be associated with bladder cancer, colorectal cancer and breast cancer. In blind classification for three independent tissue sample datasets, these signature genes consistently yielded about 98% accuracy for distinguishing early stage lung cancer from normal cases. However, the prediction accuracy for the blood platelet samples was only 64.35% (sensitivity 78.1%, specificity 50.59%, and AUROC 0.747). CONCLUSIONS: The comparison of AIS with normal and invasive tumor revealed disease-specific genes and offered new insights into the mechanism underlying AIS progression into an invasive tumor. These genes can also serve as signatures for early diagnosis of lung cancer with high accuracy. The expression profile of gene signatures identified from tissue cancer samples yielded remarkable early cancer prediction for tissue samples but relatively lower accuracy for blood platelet samples.
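
For readers less familiar with the three classification figures quoted for the platelet dataset, the sketch below shows how accuracy, sensitivity and specificity are derived from a confusion matrix. The function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true cases that were detected
    specificity = tn / (tn + fp)   # share of normal cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for 100 cancer and 100 normal samples (not study data).
sens, spec, acc = classification_metrics(tp=80, fn=20, tn=50, fp=50)
print(sens, spec, acc)
```

When the two classes are the same size, as in this toy example, accuracy is simply the mean of sensitivity and specificity, so a low specificity pulls overall accuracy down even if most cancers are detected.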


Subject(s)
Adenocarcinoma in Situ/pathology, Lung Neoplasms/pathology, Adenocarcinoma in Situ/genetics, Area Under Curve, Genetic Databases, Disease Progression, Neoplastic Gene Expression Regulation, Humans, Lung Neoplasms/genetics, Machine Learning, Neoplasm Staging, Open Reading Frames/genetics, Long Noncoding RNA/genetics, ROC Curve, Transcriptome

20.
BMC Med Genomics ; 11(Suppl 5): 104, 2018 Nov 20.
Article in English | MEDLINE | ID: mdl-30454048

ABSTRACT

BACKGROUND: Breast cancer is the most common type of invasive cancer in women. It accounts for approximately 18% of all cancer deaths worldwide. It is well known that somatic mutation plays an essential role in cancer development. Hence, we propose that a prognostic prediction model that integrates somatic mutations with gene expression can improve survival prediction for cancer patients and also reveal the genetic mutations associated with survival. METHOD: Differential expression analysis was used to identify breast cancer related genes. A genetic algorithm (GA) and univariate Cox regression analysis were applied to filter out survival related genes. DAVID was used for enrichment analysis on the somatically mutated gene set. The performance of survival predictors was assessed by the Cox regression model and the concordance index (C-index). RESULTS: We investigated the genome-wide gene expression profile and somatic mutations of 1091 breast invasive carcinoma cases from The Cancer Genome Atlas (TCGA). We identified 118 genes with high hazard ratios as breast cancer survival risk gene candidates (log rank p < 0.0001 and c-index = 0.636). Multiple breast cancer survival related genes were found in this gene set, including FOXR2, FOXD1, MTNR1B and SDC1. A further genetic algorithm (GA) step revealed an optimal gene set consisting of 88 genes with a higher c-index (log rank p < 0.0001 and c-index = 0.656). We validated this gene set on an independent breast cancer data set and achieved a similar performance (log rank p < 0.0001 and c-index = 0.614). Moreover, we revealed 25 functional annotations, 15 gene ontology terms and 14 pathways that were significantly enriched in the genes that showed distinct mutation patterns in the different survival risk groups. These functional gene sets were used as new features for the survival prediction model. In particular, our results suggested that the Fanconi anemia pathway had an important role in breast cancer prognosis.
CONCLUSIONS: Our study indicated that the expression levels of the gene signatures remain effective indicators for breast cancer survival prediction. Combining the gene expression information with other types of features derived from somatic mutations can further improve the performance of survival prediction. The pathways associated with survival risk suggested by our study can be further investigated to improve cancer patient survival.
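
The concordance index (C-index) used above to score the survival predictors can be illustrated with a short sketch. It follows the usual Harrell-style definition for right-censored data and is not the authors' code; the function name and the toy follow-up data are assumptions made for the example.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by risk score.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed event (events[i] is truthy); censored patients can only
    serve as the longer-surviving member of a pair.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0      # higher risk died earlier: correct
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5      # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up in months, 1 = death observed, 0 = censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.5, 0.1]
print(concordance_index(times, events, risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the c-index values of 0.614 to 0.656 reported in the abstract.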


Subject(s)
Algorithms, Breast Neoplasms/genetics, Breast Neoplasms/mortality, Breast Neoplasms/pathology, Female, Forkhead Transcription Factors/genetics, Neoplastic Gene Expression Regulation, Human Genome, Humans, Mutation, Proportional Hazards Models, MT2 Melatonin Receptor/genetics, Survival Analysis, Transcriptome